VAD-measure-embedded decoder with online model adaptation
Abstract
We previously proposed a decoding method for automatic speech recognition that weights hypothesis scores by voice activity detection (VAD) measures. The method uses two Gaussian mixture models (GMMs) to obtain confidence measures: one for speech and one for non-speech. To achieve good search performance, the GMMs must be adapted properly to the input utterances and the environmental noise. We describe a new unsupervised online GMM adaptation method based on MAP estimation. The robustness of the method is further improved by weighting the GMM parameter updates according to the confidence measure of the adaptation data. We also describe an approach that accelerates the adaptation by caching the statistics used to adapt the GMMs. Experimental results on the Drivers' Japanese Speech Corpus in a Car Environment (DJSC) show that combining the adaptation with the decoding method significantly improves word accuracy from 54.8% to 59.6%, a gain of 4.8 points absolute. Moreover, the weighting method improves the robustness of the unsupervised adaptation, and the cache method greatly accelerates the decoding process. Consequently, our adaptive decoding method significantly improves word accuracy in a noisy environment with only a minor increase in computational cost.
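The abstract names three ingredients: MAP adaptation of a GMM, confidence-weighted updates, and cached statistics. The sketch below illustrates how they can fit together for the mean parameters of a one-dimensional GMM. It is not the authors' implementation: the class names `Gmm1D` and `CachedMapAdapter`, the relevance factor `tau`, and the choice to adapt only the means using the standard relevance-MAP formula are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gmm1D:
    """Tiny 1-D GMM; a hypothetical stand-in for the speech or non-speech model."""
    weights: list
    means: list
    variances: list

    def posteriors(self, x):
        """Return the component responsibilities gamma_k(x) for one frame."""
        likes = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        total = sum(likes)
        if total == 0.0:
            # Numerical underflow: fall back to uniform responsibilities.
            return [1.0 / len(likes)] * len(likes)
        return [like / total for like in likes]


@dataclass
class CachedMapAdapter:
    """Confidence-weighted MAP mean adaptation with cached sufficient statistics.

    Assumption: the update is the standard relevance-MAP mean estimate
        mu_k = (tau * mu0_k + sum_t c_t * gamma_k(t) * x_t)
               / (tau + sum_t c_t * gamma_k(t)),
    where c_t in [0, 1] is the confidence of frame t.
    """
    prior_means: list
    tau: float = 10.0
    occupancy: list = field(init=False)
    first_order: list = field(init=False)

    def __post_init__(self):
        k = len(self.prior_means)
        self.occupancy = [0.0] * k    # cached sum of c_t * gamma_k(t)
        self.first_order = [0.0] * k  # cached sum of c_t * gamma_k(t) * x_t

    def accumulate(self, gmm, x, confidence):
        """Add one frame to the cache; earlier frames are never revisited."""
        for k, gamma in enumerate(gmm.posteriors(x)):
            self.occupancy[k] += confidence * gamma
            self.first_order[k] += confidence * gamma * x

    def adapted_means(self):
        """Recompute the MAP means from the cached statistics only."""
        return [
            (self.tau * m0 + s) / (self.tau + n)
            for m0, s, n in zip(self.prior_means, self.first_order, self.occupancy)
        ]


def adapt_online(gmm, frames, confidences, tau=10.0):
    """Update the GMM means frame by frame and return the adapter."""
    adapter = CachedMapAdapter(prior_means=list(gmm.means), tau=tau)
    for x, c in zip(frames, confidences):
        adapter.accumulate(gmm, x, c)
        gmm.means = adapter.adapted_means()
    return adapter
```

Because the cache holds running sums, each new frame costs one posterior evaluation and a constant-time update per component, regardless of how much adaptation data has already been seen. A frame with confidence 0 leaves the model untouched, which is one simple way low-confidence data can be kept from corrupting an unsupervised update.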
Related resources
Robust speech recognition using VAD-measure-embedded decoder
In a speech recognition system a Voice Activity Detector (VAD) is a crucial component for not only maintaining accuracy but also for reducing computational consumption. Front-end approaches which drop non-speech frames typically attempt to detect speech frames by utilizing speech/non-speech classification information such as the zero crossing rate or statistical models. These approaches discard...
Mixed decision-based noise adaptation for speech enhancement
Introduction: Frequency domain speech enhancement is focused mainly on improved estimation of spectral attenuation factors with the assumption of given noise statistics. However, in practice, the noise statistics exhibit fluctuations from frame to frame. Thus, a method for robust estimation of the noise statistics is investigated in this Letter. Conventional noise estimation can be classified i...
Proficient BMI Control Enabled by Closed-Loop Adaptation of an Optimal Feedback-Controlled Point Process Decoder
Much progress has been made in brain-machine interface (BMI) development using closed-loop decoder adaptation (CLDA) methods. CLDA fits the decoder parameters during closed-loop BMI operation based on the neural activity and inferred user velocity intention. This progress has resulted in the recent high-performance ReFIT Kalman filter (ReFIT KF) [1]. Here we develop an adaptive optimal feedback...
Voice Activity Detection Using Speech Recognizer Feedback
This paper demonstrates how feedback from a speech recognizer can be leveraged to improve Voice Activity Detection (VAD) for online speech recognition. First, reliably transcribed segments of audio are fed back by the recognizer as supervision for VAD model adaptation. This allows the much stronger LVCSR acoustic models to be harnessed without adding computation. Second, when to make a VAD deci...
Voice Activity Detection Based on Discriminative Weight Training Incorporating a Spectral Flatness Measure
In this paper, we present an approach to incorporate discriminative weight training into a statistical model-based voice activity detection (VAD) method. In our approach, the VAD decision rule is derived from the optimally weighted likelihood ratios (LRs) using a minimum classification error (MCE) method. An adaptive online means of selecting two kinds of weights based on a power spectral flatn...
Publication date: 2010